cis-Diaquabis(2,2′,2′′-tripyridylamine)zinc(II) bis(perchlorate)
In the title compound, [Zn(2,2′,2′′-tpa)2(H2O)2](ClO4)2 (2,2′,2′′-tpa is 2,2′,2′′-tripyridylamine, C15H12N4), the Zn center lies on a twofold axis and is coordinated octahedrally by two water molecules and two bidentate 2,2′,2′′-tpa ligands. The perchlorate anions are linked to the coordinated water molecules in the complex cations via O—H⋯O hydrogen bonds.
A Hierarchical Spatio-Temporal Graph Convolutional Neural Network for Anomaly Detection in Videos
Deep learning models have been widely used for anomaly detection in surveillance videos. Typical models are equipped with the capability to reconstruct normal videos and to evaluate the reconstruction errors on anomalous videos as an indication of the extent of abnormality. However, existing approaches suffer from two disadvantages. First, they can only encode the movements of each identity independently, without considering the interactions among identities, which may also indicate anomalies. Second, they rely on inflexible models whose structure is fixed across different scenes, which prevents any understanding of the scene. In this paper, we propose a Hierarchical Spatio-Temporal Graph Convolutional Neural Network (HSTGCNN) to address these problems. The HSTGCNN is composed of multiple branches that correspond to different levels of graph representation: high-level graph representations encode the trajectories of people and the interactions among multiple identities, while low-level graph representations encode the local body posture of each person. Furthermore, we propose a weighted combination of the branches that are better suited to different scenes, which improves over single-level graph representations and provides a scene understanding that serves anomaly detection: high-level graph representations are assigned higher weights to encode the moving speed and direction of people in low-resolution videos, while low-level graph representations are assigned higher weights to encode human skeletons in high-resolution videos. Experimental results show that the proposed HSTGCNN significantly outperforms current state-of-the-art models on four benchmark datasets (UCSD Pedestrian, ShanghaiTech, CUHK Avenue and IITB-Corridor) while using far fewer learnable parameters.

Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT).
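As an illustration of the weighted multi-branch combination described in the abstract, the sketch below fuses anomaly scores from two graph levels with learnable weights. This is a minimal sketch, not the authors' implementation: the module name, the softmax weighting, and the tensor shapes are assumptions, and the real HSTGCNN branches are spatio-temporal graph convolutional networks rather than the score vectors used here.

```python
import torch
import torch.nn as nn

class WeightedBranchFusion(nn.Module):
    """Illustrative only: combine per-branch anomaly scores with
    learnable weights. A scene-conditioned weighting (as the paper
    suggests for low- vs. high-resolution videos) could replace the
    static logits with a small network over scene features."""

    def __init__(self, num_branches: int):
        super().__init__()
        # Softmax over logits keeps the branch weights positive
        # and summing to one.
        self.logits = nn.Parameter(torch.zeros(num_branches))

    def forward(self, branch_scores: torch.Tensor) -> torch.Tensor:
        # branch_scores: (num_branches, batch) anomaly scores, one row
        # per graph level (e.g., trajectory-level and skeleton-level).
        weights = torch.softmax(self.logits, dim=0)
        return (weights.unsqueeze(1) * branch_scores).sum(dim=0)

# Example: fuse high-level (trajectory) and low-level (pose) scores.
fusion = WeightedBranchFusion(num_branches=2)
scores = torch.rand(2, 8)       # hypothetical scores for 8 clips
combined = fusion(scores)       # (8,) fused anomaly scores
```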
Aggregation signature for small object tracking
Small object tracking is an increasingly important task that has nevertheless been largely unexplored in computer vision. The great challenges stem from two facts: 1) small objects show extremely vague and variable appearances, and 2) they are lost more easily than normal-sized objects because of camera shake. In this paper, we propose a novel aggregation signature suited to small object tracking, aimed in particular at the challenge of sudden and large drift. We make three-fold contributions in this work. First, technically, we propose a new saliency-based descriptor, named the aggregation signature, that represents highly distinctive features of small objects. Second, theoretically, we prove that the proposed signature matches the foreground object more accurately with high probability. Third, experimentally, the aggregation signature achieves high performance on multiple datasets, outperforming state-of-the-art methods by large margins. Moreover, we contribute two newly collected benchmark datasets, small90 and small112, for visually small object tracking. The datasets will be made available at https://github.com/bczhangbczhang/.

Comment: IEEE Transactions on Image Processing, 201
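The abstract does not define the aggregation signature itself, so the following is only a loose illustration of the general idea of a saliency-weighted descriptor, assuming a grayscale patch and a saliency map of the same shape. The function name and the histogram construction are hypothetical and do not reproduce the paper's descriptor.

```python
import numpy as np

def saliency_weighted_descriptor(patch: np.ndarray,
                                 saliency: np.ndarray,
                                 n_bins: int = 16) -> np.ndarray:
    """Sketch only: an intensity histogram of an image patch in which
    each pixel's vote is weighted by a saliency map, loosely mimicking
    a saliency-based descriptor for small objects. Not the paper's
    aggregation signature."""
    assert patch.shape == saliency.shape
    hist, _ = np.histogram(patch.ravel(),
                           bins=n_bins, range=(0.0, 1.0),
                           weights=saliency.ravel())
    norm = hist.sum()
    return hist / norm if norm > 0 else hist

# Usage: describe a template patch, then compare candidate patches
# against it (e.g., by histogram distance) during tracking.
template = saliency_weighted_descriptor(np.random.rand(16, 16),
                                        np.random.rand(16, 16))
```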
PVD-AL: Progressive Volume Distillation with Active Learning for Efficient Conversion Between Different NeRF Architectures
Neural Radiance Fields (NeRF) have been widely adopted as practical and versatile representations of 3D scenes, facilitating various downstream tasks. However, the available architectures, including plain Multi-Layer Perceptrons (MLPs), Tensors, low-rank Tensors, Hashtables, and their compositions, each come with trade-offs. For instance, Hashtables-based representations allow faster rendering but lack clear geometric meaning, making spatial-relation-aware editing challenging. To address this limitation and maximize the potential of each architecture, we propose Progressive Volume Distillation with Active Learning (PVD-AL), a systematic distillation method that enables any-to-any conversion between different architectures. PVD-AL decomposes each structure into two parts and progressively performs distillation from shallower to deeper volume representations, leveraging effective information retrieved from the rendering process. Additionally, a three-level active learning technique provides continuous feedback during the distillation process, leading to high-performance results. Empirical evidence validating our method is presented on multiple benchmark datasets. For example, PVD-AL can distill an MLP-based model from a Hashtables-based model 10-20X faster and with 0.8-2 dB higher PSNR than training the NeRF model from scratch. Moreover, PVD-AL permits the fusion of diverse features among distinct structures, enabling models with multiple editing properties and providing more efficient models that meet real-time requirements. Project website: http://sk-fun.fun/PVD-AL.

Comment: Project website: http://sk-fun.fun/PVD-AL. arXiv admin note: substantial text overlap with arXiv:2211.1597
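To make the distillation idea concrete, here is a generic teacher-student sketch in PyTorch: a frozen teacher representation supervises a student at sampled 3D points. The module stand-ins, the loss, and the sampling are assumptions for illustration; PVD-AL itself additionally splits each architecture into two parts, distills progressively from shallower to deeper volume representations, and selects queries with its three-level active learning scheme.

```python
import torch
import torch.nn as nn

def distillation_step(teacher: nn.Module, student: nn.Module,
                      optimizer: torch.optim.Optimizer,
                      points: torch.Tensor) -> float:
    """One step of distilling a student NeRF from a frozen teacher by
    matching predicted density/color at sampled 3D points (generic
    sketch, not the PVD-AL algorithm)."""
    with torch.no_grad():
        target = teacher(points)          # teacher's density + color
    pred = student(points)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical stand-ins: any modules mapping (N, 3) points to
# (N, 4) density+RGB would fit here, e.g. a Hashtables-based teacher
# and an MLP student in the actual any-to-any setting.
teacher = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))
student = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
loss = distillation_step(teacher, student, opt, torch.rand(1024, 3))
```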
Text-driven Editing of 3D Scenes without Retraining
Numerous diffusion models have recently been applied to image synthesis and editing, but editing 3D scenes is still in its early stages. It poses various challenges, such as the need to design specific methods for different editing types, to retrain models for each new 3D scene, and the absence of convenient human interaction during editing. To tackle these issues, we introduce a text-driven editing method, termed DN2N, which allows the direct acquisition of a NeRF model with universal editing capabilities, eliminating the need for retraining. Our method employs off-the-shelf text-based editing models for 2D images to modify the 3D scene images, followed by a filtering process that discards poorly edited images which would disrupt 3D consistency. We then treat the remaining inconsistency as noise perturbation to be removed, which we address by generating training data with similar perturbation characteristics. We further propose cross-view regularization terms to help the generalized NeRF model mitigate these perturbations. Our text-driven method allows users to edit a 3D scene with their desired description, which is friendlier, more intuitive, and more practical than prior works. Empirical results show that our method achieves multiple editing types, including but not limited to appearance editing, weather transition, material changing, and style transfer. Most importantly, our method generalizes well, with editing abilities shared among a set of model parameters rather than requiring a customized editing model for specific scenes, and thus infers novel views with editing effects directly from user input. The project website is available at http://sk-fun.fun/DN2N
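The edit-then-filter stage described above can be sketched as a small pipeline, assuming hypothetical stand-ins for the 2D editor and the cross-view consistency score; neither is specified by the abstract, and the names below are not the paper's API.

```python
from typing import Callable, List, Sequence
import numpy as np

def build_editing_training_set(
        views: Sequence[np.ndarray],
        edit_2d: Callable[[np.ndarray], np.ndarray],
        consistency: Callable[[np.ndarray, Sequence[np.ndarray]], float],
        threshold: float) -> List[np.ndarray]:
    """Sketch of the data-preparation stage: edit every view with an
    off-the-shelf 2D text-driven editor, then keep only edits whose
    cross-view consistency score clears a threshold, discarding images
    that would disrupt 3D consistency during NeRF training."""
    edited = [edit_2d(v) for v in views]
    kept = [img for img in edited
            if consistency(img, edited) >= threshold]
    return kept  # training images for the generalized NeRF model

# Dummy stand-ins so the sketch runs end to end.
edit = lambda img: np.clip(img * 1.1, 0.0, 1.0)   # placeholder "edit"
score = lambda img, pool: 1.0                     # placeholder metric
train_set = build_editing_training_set(
    [np.random.rand(8, 8, 3) for _ in range(4)], edit, score, 0.5)
```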